Determination of sizes and flexibilities of RNA molecules is important in understanding
the nature of packing in folded structures and in elucidating interactions between RNA
and DNA or proteins. Using the coordinates of the structures of RNA in the Protein Data
Bank we find that the size of the folded RNA structures, measured using the radius of
gyration, Rg, follows the Flory scaling law, namely, Rg = 5.5N^{1/3} Å where N is the
number of nucleotides. The shape of RNA molecules is characterized by the asphericity Δ
and the shape parameter S, which are computed using the eigenvalues of the moment of
inertia tensor. From the distribution of Δ, we find that a large fraction of folded RNA
structures are aspherical, and the distribution of S values shows that RNA molecules are
prolate (S > 0). The flexibility of folded structures is characterized by the persistence
length lp. By fitting the distance distribution function, P(r), computed using the
coordinates of the folded RNA, to the worm-like chain model we extracted the persistence
length lp. We find that lp ≈ 1.5N^{0.33} Å, which might reflect the large separation
between the free energies that stabilize secondary and tertiary structures. The dependence
of lp on N implies that the average length of helices should increase as the size of RNA
grows. We also analyze packing in the structures of ribosomes (30S, 50S, and 70S) in terms
of Rg, Δ, S, and lp. The 70S and the 50S subunits are more spherical compared to most RNA
molecules. The globularity in 50S is due to the presence of an unusually large number
(compared to the 30S subunit) of small helices that are stitched together by bulges and
loops. Comparison of the shapes of the intact 70S ribosome and the constituent particles
suggests that folding of the individual molecules might occur prior to assembly.

INTRODUCTION 

Molecular recognition between RNAs or RNA and protein is involved in a number of cellular 
functions. In all these processes RNA interacts with other biomolecules. In order to under- 
stand the biophysical basis of interactions of RNA with other biological molecules it is necessary 
to characterize the shapes of the interacting partners. Hence, it is important to elucidate the 
shapes and flexibilities of RNA structures. The large increase in the number of three dimen- 
sional structures allows us to quantify RNA shapes which is needed to describe the assembly of 
complexes such as the ribosome. 

In contrast to the situation in RNA, much is known about packing and shape fluctuations
in proteins. In part this is because the number of solved protein structures is ~30,000
while the RNA structure database contains only ~600 structures. Despite considerable
success in the secondary structure predictions of nucleic acid sequences using energy
minimization dynamic programming algorithms or comparative sequence analysis, the
complicated nature of counterion-mediated tertiary interactions in RNAs makes it difficult
to obtain three dimensional RNA structures using computational methods. The recent
experimental determination of medium- to large-sized RNA structures has prompted us to
perform a statistical analysis of RNA structures with the aim of characterizing their
shapes and flexibility.

In this paper, we study the structural features of RNA using the currently available RNA
three dimensional structures. The size of RNA, as measured by the radius of gyration Rg,
shows that typically RNA molecules are compact. The variation of Rg with the number (N)
of nucleotides obeys the Flory law, i.e., Rg = aN^{1/3} Å. Although the overall scaling
law for Rg for RNA is identical to that for proteins, there are considerable differences
in their shapes. We find that the folded states of RNAs are largely prolate and are
considerably more aspherical than proteins. The flexibility of RNA, which is crucial in
describing interactions with proteins and RNA and DNA, is described in terms of the
persistence length (lp), which can be measured using X-ray scattering and other methods.
The values of lp for RNA, which are considerably larger



also describe the unusual structural characteristics of the ribosome, a large ribonucleoprotein 
complex. 

METHODS 

RNA structures: We computed several quantities to characterize the shapes of RNA
using the atomic coordinates of their structures determined by X-ray crystallography, NMR, or
cryo-EM. The coordinates for all RNA structures were obtained from the Protein Data Bank
(PDB). Our analysis is performed for over 1185 individual RNA chains with the number of
nucleotides > 10 found in 642 RNA related PDB files as of June 2005. Among these, 195
RNA chain structures are monomers, and the rest of the chains are part of oligomers or appear 
in complexes with other RNA molecules or proteins. Structural features in the monomeric form 
can be different from those determined in an oligomer or complex because the intermolecular 
interaction can affect the individual chain structure. Therefore, we analyzed the two groups 
of structures separately. For comparison, we have also calculated shape characteristics for a 
dataset of proteins. The results for proteins enable us to assess certain unusual features of 
RNA-protein interactions especially in the ribosome. 

Size: The radius of gyration (Rg) is an indicator of the overall size of RNA. The value of
Rg, which can be measured using small angle X-ray or neutron scattering, is calculated using

$$R_g^2 = \frac{1}{2\left(\sum_{i=1}^{M} m_i\right)^2} \sum_{i=1}^{M}\sum_{j=1}^{M} m_i m_j \left(\vec{r}_i - \vec{r}_j\right)^2 \qquad (1)$$

where M is the number of atoms in the molecule, r_i is the position of atom i, and m_i is
the mass of the i-th atom. In the calculation of Rg for RNA structures we used only the
coordinates of the heavy atoms (C, N, O, P).
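
As an illustration of the size calculation, the following minimal sketch computes a mass-weighted Rg about the center of mass, which is algebraically equivalent to a mass-weighted double sum over atom pairs. The function name `radius_of_gyration`, the `MASS` table, and the approximate atomic masses are assumptions introduced here for illustration; they are not taken from the original analysis.

```python
import numpy as np

# Approximate atomic masses (amu) of the heavy atoms; illustrative values.
MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974}

def radius_of_gyration(coords, elements):
    """Mass-weighted radius of gyration about the center of mass."""
    r = np.asarray(coords, dtype=float)               # shape (M, 3), in angstroms
    m = np.array([MASS[e] for e in elements])          # shape (M,)
    center = (m[:, None] * r).sum(axis=0) / m.sum()    # center of mass
    sq_dist = ((r - center) ** 2).sum(axis=1)          # |r_i - R_C|^2 for each atom
    return float(np.sqrt((m * sq_dist).sum() / m.sum()))

# Two equal-mass atoms 10 angstroms apart: each sits 5 angstroms from the center.
print(radius_of_gyration([[0, 0, 0], [10, 0, 0]], ["C", "C"]))
```

In practice the coordinates and element symbols would be read from the heavy-atom records of a PDB file; the parsing step is omitted here.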

Shape: The deviation from the spherical shape is characterized by the asphericity Δ and
the shape parameter S, both of which are calculated from the inertia tensor

$$T_{\alpha\beta} = \frac{1}{2\left(\sum_{i} m_i\right)^2} \sum_{i=1}^{M}\sum_{j=1}^{M} m_i m_j (r_{i\alpha} - r_{j\alpha})(r_{i\beta} - r_{j\beta}) = \frac{1}{\sum_{i} m_i} \sum_{i=1}^{M} m_i (r_{i\alpha} - R_{C\alpha})(r_{i\beta} - R_{C\beta}) \qquad (2)$$

where r_{iα} is the α-th Cartesian component of the position of atom i, and
R_{Cα} = Σ_i m_i r_{iα} / Σ_i m_i is the α-th component of the center of mass. The square
of the radius of gyration is Rg² = tr T. The eigenvalues λ1, λ2 and λ3 of the matrix T are
the squares of the three principal radii of gyration. The extent of asphericity is
characterized using Δ (0 ≤ Δ ≤ 1)

$$\Delta = \frac{3}{2}\,\frac{\sum_{i=1}^{3} (\lambda_i - \bar{\lambda})^2}{(\mathrm{tr}\,T)^2} \qquad (3)$$

where λ̄ = (λ1 + λ2 + λ3)/3. For a perfect sphere Δ = 0. Deviation from Δ = 0 indicates the
extent of anisotropy. The overall shape of a molecule is assessed using

$$S = 27\,\frac{\prod_{i=1}^{3} (\lambda_i - \bar{\lambda})}{(\mathrm{tr}\,T)^3} \qquad (4)$$

which satisfies the bound −1/4 ≤ S ≤ 2. Negative values of S correspond to oblate ellipsoids
and S > 0 to prolate ellipsoids.
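
The shape parameters can be evaluated directly from coordinates. The sketch below assumes the standard definitions of asphericity and shape parameter in terms of the eigenvalues of the mass-weighted gyration tensor (Δ between 0 and 1, S between −1/4 and 2); the function name `shape_parameters` and the option of unit masses are choices made here for illustration.

```python
import numpy as np

def shape_parameters(coords, masses=None):
    """Return (asphericity, shape parameter) from the gyration tensor eigenvalues."""
    r = np.asarray(coords, dtype=float)
    m = np.ones(len(r)) if masses is None else np.asarray(masses, dtype=float)
    center = (m[:, None] * r).sum(axis=0) / m.sum()
    d = r - center
    # T_ab = sum_i m_i d_ia d_ib / sum_i m_i
    tensor = np.einsum("i,ia,ib->ab", m, d, d) / m.sum()
    lam = np.linalg.eigvalsh(tensor)     # squared principal radii of gyration
    trace = lam.sum()                    # equals Rg^2
    dev = lam - lam.mean()
    delta = 1.5 * (dev ** 2).sum() / trace ** 2
    shape = 27.0 * dev.prod() / trace ** 3
    return float(delta), float(shape)

# Limiting cases: a rod, a flat square, and an octahedron (isotropic).
rod = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
square = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
octahedron = square + [[0, 0, 1], [0, 0, -1]]
for name, pts in (("rod", rod), ("square", square), ("octahedron", octahedron)):
    print(name, shape_parameters(pts))
```

The three test shapes reach the limits quoted in the text: the rod is maximally aspherical and prolate, the flat square is oblate, and the octahedron is isotropic.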

Most studies of packing in proteins and RNAs involve tessellation of space, which always
introduces a certain amount of arbitrariness. In contrast, the shape parameters Δ and S
are directly computed using only the atomic coordinates. Knowledge of Δ and S is
important in determining the overall motion of RNA and their interaction with other biomolecules.

Persistence Length: A parameter that describes the flexibility of biomolecules is the
persistence length, lp, which is most clearly defined by assuming that RNA structures can be
described by a polymer model. Based on previous experimental studies it is suspected that the
statistical properties of dsDNA, ssDNA, and RNA can be described using the worm-like
chain (WLC) model. For WLC models lp can be estimated provided the distribution of the
end-to-end distance R_E or of Rg is known. Exact calculation of neither P(R_E) nor P(Rg)
is possible for the WLC. A simple and accurate theoretical expression has been derived for
P(R_E) of the worm-like chain using the mean field approximation. The resulting
distribution, which is in good agreement with computer simulations, is

$$P_{WLC}(r_E) = \frac{4\pi C\, r_E^2}{(1 - r_E^2)^{9/2}} \exp\left[-\frac{3t}{4(1 - r_E^2)}\right] \qquad (5)$$

where r_E = R_E/L and t = L/lp. L is the contour length. For RNA molecules, which from the
perspective of polymers can be viewed as branched polyelectrolyte chains, the contour length
is also an unknown parameter. The normalization constant is
C = 1/[π^{3/2} e^{−α} α^{−3/2} (1 + 3α^{−1} + (15/4)α^{−2})] with α = 3t/4. When lp is small
P_WLC(r_E) reduces to that of a Gaussian chain, whereas for large lp P_WLC(r_E) approaches
the rod limit as r_E → 1.
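
A quick numerical check of the mean-field WLC distribution is sketched below. It assumes the form P(r_E) = 4πC r_E²(1 − r_E²)^{−9/2} exp[−α/(1 − r_E²)] with α = 3t/4 and the normalization constant quoted in the text, and verifies that the integral over 0 ≤ r_E < 1 is unity. The function names `p_wlc` and `integrate`, the grid size, and the sample values of t are illustrative choices.

```python
import numpy as np

def p_wlc(r_e, t):
    """Mean-field WLC end-to-end distribution; r_e = R_E/L in [0, 1), t = L/l_p."""
    r_e = np.asarray(r_e, dtype=float)
    alpha = 0.75 * t
    norm = 1.0 / (np.pi ** 1.5 * np.exp(-alpha) * alpha ** -1.5
                  * (1.0 + 3.0 / alpha + 3.75 / alpha ** 2))
    one_minus = 1.0 - r_e ** 2
    return 4.0 * np.pi * norm * r_e ** 2 / one_minus ** 4.5 * np.exp(-alpha / one_minus)

def integrate(y, x):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# The grid stops just short of r_e = 1, where the distribution vanishes.
r = np.linspace(0.0, 1.0, 200001, endpoint=False)
for t in (1.0, 5.0, 20.0):
    print(t, integrate(p_wlc(r, t), r))
```

Small t (stiff chains) shifts the weight of the distribution toward r_E → 1, while large t (flexible chains) concentrates it at small r_E, in line with the rod and Gaussian limits described above.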

Although direct measurements of P(R_E) for biomolecules are not routinely performed it is
conceivable that P(R_E) may be obtained using single molecule FRET experiments. However,
the distance distribution function P(r) can be measured using SAXS experiments. Based
on general arguments, we expect that the distribution functions P(r) and P(R_E) should
coincide provided r ≳ Rg. Because ⟨R_E⟩ ~ ⟨Rg⟩ ~ (L lp)^{1/2} for the WLC provided L is
large, it follows from scaling arguments that P(r) should decay for large r as

P(r) ∼ […] (6)

where x = lp r / Rg² and β is an arbitrary constant. In practice Eq. 6 accurately describes
P(r) computed using the coordinates of RNA structures when r/Rg > 1. We determined lp by
fitting the P(r) function for RNA structures to Eq. 6.

Recently, we used Eq. 6 to analyze small angle X-ray scattering data. We showed that lp
for the Azoarcus ribozyme changes by a factor of 2 as the molecule folds upon addition of
counterions (Mg2+ or Na+). Although the structural basis for the success of the WLC in
describing certain properties of folded RNA is unclear, Eq. 6 is useful in analyzing
scattering data.

For purposes of comparison we have also calculated P(r) for folded structures for 56,000
protein chains. To our knowledge the persistence length of proteins has not been directly
measured. We obtain lp by fitting P(r), obtained from the coordinates of the structures in
the PDB, to Eq. 6.

RESULTS

Distribution of RNA structures as a function of N: From the distribution P(N) of the
number of RNA structures in the PDB as a function of chain length N (Fig. 1), we find
that ~70% of the structures in the database lie in the range 10 < N < 30. The peak in P(N)
between 70 < N < 80 is due to the large number of tRNA structures that have been
determined in various conditions. The peaks at N ≈ 1500 and N ≈ 3000 correspond to 16S and
23S ribosomal RNAs, respectively. Compared to the statistics of protein structures (see
Fig. 1 inset), RNA structures are more clustered at small values of N but span a broader
range of N. However, this distribution is unrelated to the number of RNA molecules that
are relevant to biological functions. There is a broad range in N that represents
noncoding RNAs. For example, the length of human ncRNA functioning in the gene silencing
process is ~100,000 nucleotides. From Fig. 1, which reflects the current status in RNA
structure determination, it is clear that there is a large gap between the total number of
functional RNAs and those with known three dimensional structures.

Size of RNA obeys the Flory law: If the overall shape of RNA is spherical then its
volume, an extensive variable, is V ~ Rg³ with Rg being the radius of gyration. For accurate
computation of volumes one should use the hydrodynamic radius instead of Rg. Because
V ~ a³N, where a is a characteristic length (approximately the distance separating two
consecutive nucleotides), it follows that Rg ~ aN^{1/3}. This general result was first
derived by Flory who showed that Rg ~ aN^ν where ν = 1/3 for maximally compact structures.
Because RNA is a polyelectrolyte its Rg depends on the concentration of counterions (C).
At low values of C, RNA is expanded and the transition to a compact structure occurs only
when C exceeds a critical value.

We calculated Rg, using Eq. 1 (see Methods), for the 1155 "folded" RNA structures. A plot
of Rg as a function of N confirms the Flory result. From the plot in Fig. 2(a) we find
that, for the folded RNA structures, Rg can be accurately calculated using

$$R_g = aN^{1/3} \qquad (7)$$

where a = 5.5 Å. The prefactor for the folded structures approximately corresponds
to the average distance (~5.5 Å) between the phosphate groups along the backbone (Fig. 2(b)).
Recent measurement of Rg for the compact state of the 195-nucleotide Azoarcus ribozyme at
high concentration of Na+ or Mg2+ shows that Rg ≈ 35 Å. From Eq. 7 we find Rg ≈ 32 Å.
This analysis further suggests that the prefactor in Eq. 7 may indeed be interpreted as the
mean distance between consecutive phosphate groups in the folded structures. If the Rg data
in Fig. 2(a) for N < 20 are neglected we find that Eq. 7 is obeyed with a ≈ 5 Å. Thus, the
scaling relation is robust.
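
The scaling relation is simple enough to evaluate directly. The sketch below reproduces the Azoarcus estimate from Rg = aN^{1/3} with a = 5.5 Å and shows one way, assumed here for illustration, of estimating the prefactor from a set of (N, Rg) pairs when the exponent is held fixed at 1/3 (a least-squares fit in log space). The function names and the synthetic data are hypothetical.

```python
import math

def flory_rg(n_nucleotides, a=5.5):
    """Radius of gyration (angstroms) of a compact chain, Rg = a * N**(1/3)."""
    return a * n_nucleotides ** (1.0 / 3.0)

def fit_prefactor(n_values, rg_values, nu=1.0 / 3.0):
    """Least-squares estimate of a in log space with the exponent nu held fixed."""
    logs = [math.log(rg) - nu * math.log(n) for n, rg in zip(n_values, rg_values)]
    return math.exp(sum(logs) / len(logs))

# 195-nucleotide Azoarcus ribozyme
print(round(flory_rg(195), 1))  # 31.9

# Synthetic data generated from the law itself recover the prefactor.
sizes = [20, 76, 195, 1500, 3000]
print(fit_prefactor(sizes, [flory_rg(n) for n in sizes]))
```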

It is perhaps more reasonable to view RNA structures as formed from relatively rigid
duplexes that are linked by flexible motifs such as bulges, loops, etc. In such a picture
the fraction of base-paired nucleotides can be chosen as a variable to describe the overall
size. We have shown previously (see Fig. 10 in our earlier work) that the number of base
pairs in RNA is ∝ N. Thus, the Flory result would be valid even if one accounts for the
rigidity of RNA duplexes.
Single-chain RNAs are aspherical and prolate: Even though folded RNA structures
are compact, as assessed by their size, there are substantial deviations from sphericity.
Indeed, the distribution P(Δ) for single chain RNAs (Fig. 3(a)) has a broad peak around
Δ ≈ 0.3. This shows that the native-state conformations of single chain RNA molecules
deviate greatly from a sphere. This finding is in stark contrast to P(Δ) in single-chain
protein structures where the peak of the distribution is at Δ < 0.1. In addition, only ~15%
of single-chain RNA structures have Δ < 0.2, while in proteins the corresponding number is
~80%. This analysis shows that even if native structures of RNAs are compact
(Rg = 5.5N^{1/3} Å) they are highly aspherical.

Because many RNAs are organized as oligomers, we also obtained the values of Δ for such
structures. The distribution of Δ for oligomeric RNAs is also very broad (Fig. 3(a), middle
panel). Approximately 34% of the 518 oligomeric RNAs have Δ < 0.2, which shows that
oligomerization in RNA increases the sphericity of the molecule. This conclusion is
substantiated by analyzing R_Δ, which is the ratio between the asphericity of the oligomer
and the average asphericity of the individual chains. If R_Δ = 1 then the oligomers and the
chains have the same asphericity, while R_Δ < 1 indicates that the oligomer is more
spherical than its components. Nearly 60% of oligomeric RNAs have R_Δ < 1.

The distribution of the shape parameter, S, in single-chain RNAs (Fig. 3(b), top panel)
shows that RNA is mostly prolate because most of the chains have S > 0. This tendency
towards prolate shapes is stronger than in proteins where ~50% of single chains are spherical
or nearly so. On the other hand, the complexes of RNA chains found in the PDB structures
exhibit a bias towards spherical structures as shown by the peak around S = 0 in Fig. 3(b),
bottom panel. It should be emphasized that there is no systematic dependence of Δ or S on N.
A plot of Δ and S against N shows no correlation whatsoever. The observed variation is
directly attributable to sequence and hence the topology of the folded structure.

Distribution function of radius of gyration can be described by WLC model: For
the database of RNA molecules, we calculated the distance distribution, P(r), using the
coordinates of the heavy atoms. The P(r) functions (Fig. 4(a)) for a few RNA molecules
resemble those obtained using SAXS experiments for compact RNA molecules. The value of the
persistence length is obtained by fitting P(r) to Eq. 6 in the range Rg < r < 2.5Rg. As can
be seen from Fig. 5, the value of lp varies between 5 and 25 Å.

If the WLC model correctly describes the distance distribution function, an important
prediction follows from Eq. 6, namely, that by replacing r by the dimensionless variable
x = r lp/Rg² all the P(r) curves must coincide for r/Rg > 1. In other words, irrespective
of the size, sequence or the nature of interactions that stabilize the native topology, the
tails of P(r) (r > Rg) should superimpose. Thus, P(r) should be a function of only
lp r/Rg². This important prediction is validated in Fig. 4(b), in which a plot of P(x) with
x = r lp/Rg² shows that all the structures follow the same functional form for x > 0.5 (the
same analysis has been performed on the end-to-end distance distribution of DNA). From this
result we conclude that the distance distribution functions of RNA structures are well
described by the WLC model. We do not have any structural basis for this observation.
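
The quantities entering this comparison are straightforward to compute from coordinates. The following sketch builds a normalized histogram of all pairwise heavy-atom distances as an estimate of P(r) and rescales distances to the dimensionless variable x = r lp/Rg². The function names, the bin count, and the random test coordinates are assumptions made for illustration; the fit of the tail to the WLC form is not included.

```python
import numpy as np

def distance_distribution(coords, bins=50):
    """Normalized histogram of all pairwise distances; returns (bin centers, P(r))."""
    r = np.asarray(coords, dtype=float)
    diff = r[:, None, :] - r[None, :, :]          # O(M^2) memory; fine for a sketch
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(r), k=1)          # count each pair once
    hist, edges = np.histogram(dist[upper], bins=bins, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, hist

def scaled_variable(r, lp, rg):
    """Dimensionless distance x = r * lp / Rg**2 used to collapse the P(r) tails."""
    return np.asarray(r, dtype=float) * lp / rg ** 2

# Hypothetical point cloud standing in for heavy-atom coordinates.
rng = np.random.default_rng(0)
points = rng.normal(scale=10.0, size=(200, 3))
centers, p_r = distance_distribution(points, bins=40)
rg = np.sqrt(((points - points.mean(axis=0)) ** 2).sum(axis=1).mean())
x = scaled_variable(centers, lp=15.0, rg=rg)      # lp = 15 is an arbitrary test value
print(x[:3], p_r[:3])
```

For large structures the full pairwise array becomes memory intensive, and a chunked or tree-based distance calculation would be preferable.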

Persistence length increases with N: It is remarkable that P(r) for folded RNA is well
described by the WLC model, which accounts only for the bending penalty of a thin elastic
material. The structural basis for this important finding is not clear. By fitting P(r) to
Eq. 6 for r/Rg > 1 we find that lp for folded structures increases with N. The finding that
lp grows as lp = 1.5N^α Å with α ≈ 1/3 can be rationalized using the arguments given below.
A consequence of the sublinear growth of lp with N is that the effective contour length for
folded RNA must also grow sublinearly with N, i.e.,
L_eff = 3 × (Rg²/lp) = 3 × (5.5² N^{2/3} / 1.5 N^{1/3}) Å ≈ 60N^{1/3} Å. In the unfolded
state we expect the contour length L ∝ N. Interestingly, recent single molecule measurements
have also shown that lp for microtubules depends on the contour length.

The increase in lp with N is related to the restriction that the folded states of
biomolecules be conformationally less dynamic than unfolded states. It is known from polymer
physics that if lp is fixed and there are no interactions that stabilize a specific
structure then on large scales (≫ lp) the structure would be intrinsically flexible. This
would mean that spontaneous global fluctuations of folded RNA would be highly likely due to
the increase in conformational entropy. The requirement that biomolecules should adopt a
nearly unique native fold, which minimizes entropy in the native basin of attraction (NBA),
implies that lp itself should grow with N. In contrast, for unfolded RNA, whose
conformational entropy is greater than that of the structures in the NBA, we expect that lp
should be independent of N (see Appendix).

The persistence length lp, which determines the flexibility of RNA, depends on the
concentration, shape, and size of counterions. The balance of the effective energetics of
interactions (stacking interactions, hydrogen bonding, hydrophobic interaction, repulsion
between phosphate groups, and tertiary interactions) renormalizes lp. Let us assume that the
interactions are approximated as pairwise additive and short-ranged,
ΔG ≈ Σ_{|r_i − r_j| < Rg} Δg_ij. In the presence of these interactions the persistence
length should scale as the range of the interactions, i.e., lp ~ Rg ~ N^{1/3}. The non-local
interactions, which stabilize the folded RNA structures, grow with N and hence affect lp. In
the absence of interactions that stabilize the three dimensional fold, lp is determined only
by the intrinsic property of the primary sequence and hence should not depend on N (see
Appendix).

We further rationalize the dependence of lp on N by noting that about 54% of all nucleotides
in folded RNA structures are involved in base pairing (see Fig. 10 in our earlier work). One
possible way, independent of N, of achieving the 54% base pairing is to distribute the base
pairs over several short duplexes that are stabilized by tertiary interactions in the native
state. Because the tertiary interactions in RNA are weaker than the base stacking (and other)
interactions that stabilize hairpin-like structures, creation of several short duplexes is
not favorable. Alternatively, it is free energetically more favorable to create a smaller
number of longer, stable, rigid duplexes that are stabilized by tertiary interactions to
create a nearly spherical shape. This strategy seems to operate as N increases, as seen in
ribosomes. As a consequence of the presence of a large number of rigid duplexes, which
reflects the hierarchical nature of RNA assembly, lp increases with N. In other words, in
RNA there is a clear separation in the energy scales stabilizing secondary and tertiary
interactions. Such a hierarchy implies that stiffness itself must be dependent on N.
Because such a clear separation in structural organization does not exist in proteins we
expect that lp in proteins must have a weaker dependence on N (Fig. 5). A similar reasoning
has been given to explain the growth of lp with L for microtubules.

DISCUSSION 

Differences in shapes and packing between proteins and RNA: It is difficult to compare,
in absolute terms, packing in proteins and RNA because the nature of the interactions that
stabilize their native structures is distinct. Nevertheless, the Flory scaling
(Rg ~ aN^{1/3}) observed in RNA and proteins shows that both are maximally compact. For a
given N, the approximate volume of RNA is larger than that of proteins. The ratio is
V_RNA/V_prot ~ (a_RNA/a_prot)³ ≈ 5.6 for a fixed N. This suggests that, in all likelihood,
RNA is more loosely packed than proteins, a conclusion that is in apparent contradiction
with a recent structural analysis. Voss and Gerstein based their conclusion on Voronoi
construction to decipher volumes of RNA and specific volume calculations. They concluded
that "based on well packed atoms" RNA is more tightly packed than proteins. The inherent
arbitrariness in assigning volumes to atoms based on Voronoi tessellation of space and the
use of mass in the definition of specific volume obscures packing effects, which should be
based on sizes of nucleotides alone. The present computations show that, based on volume
fraction considerations, RNA is not as compact as proteins as long as N (the number of
nucleotides or the number of amino acids) is fixed.
The observed differences between shapes of RNA and proteins are primarily due to the
nature of interactions that stabilize the folded structures of RNA and proteins. Tertiary
structure formation in RNA must be preceded by substantial neutralization of the negative
charges of phosphate groups. Condensation of counterions that are non-specifically bound
results in the residual charge on the phosphate group being less than ~ −0.1e where e is
the charge of the electron. However, packing in the resulting tertiary fold is determined not
only by interactions involving nucleotides but also by correlations between counterions. The
condensation of a large number of counterions needed to neutralize the charges on the
phosphate groups results in spatial correlation between them. If the volume excluded by the
counterions is large (for example the volume of cobalt hexammine is greater than that of
Mg2+) then binding of one counterion prevents another one from being spatially adjacent.
These counterion-mediated interactions and their correlation also inherently affect packing
in RNA. In contrast, packing in the core of proteins is predominantly determined by
interactions between hydrophobic side chains and their contacts with the protein backbone.
Because of the absence of additional ligands, except in certain cases like heme proteins,
dense packing in proteins is easier to achieve.

Shape fluctuations of proteins and RNA in the ribosome: The analysis of shape
and flexibility of isolated proteins and RNA gives insight into packing in isolated
biomolecules. However, in a vast majority of cases, function requires interactions between
two or more components. A prime example is the ribosome, a ribonucleoprotein complex, that
plays a central role in protein synthesis. Complexes of both small and large subunits with
various antibiotics have revealed the mechanism of the ribosomal machinery for tRNA
recognition and protein synthesis. The remarkable three dimensional map of the entire
ribosome (70S), including three tRNAs and mRNA, which shows a snapshot of the translation
process, has also been resolved by cryo-EM techniques at 5.5 Å resolution. The binding
interface between the 30S and 50S subunits, the tRNA recognition site in the 30S subunit,
and the peptidyl transferase site on the 50S subunit are all devoid of ribosomal proteins.
The cavity is formed at the interface between the two subunits where three tRNAs and a
string of mRNA can be accommodated. The structures of ~50 ribosomal proteins have also been
investigated, giving further insights into the interaction and the assembly process of the
ribosome.

Comparison of the shapes of the structures in isolation and in the complex allows us to
infer if there are large scale shape changes upon complexation. To this end, we analyzed the
individual components of the ribosome as well as each structural domain by using the
parameters that quantify molecular sizes, shapes, and flexibilities of the individual
components. We used the atomic coordinates from 1GIX (30S subunit composed of 16S rRNA,
3 tRNA, 1 mRNA, and 20 r-proteins) and 1GIY (50S subunit composed of 23S rRNA, 5S rRNA and
22 r-proteins) that form an entire ribosome complex upon combination. The parameters
characterizing the structural components of the ribosome are summarized in Table 1.

r-RNAs: Each ribosomal RNA (16S, 23S rRNA) can be further decomposed into several
structural domains whose folding is autonomous even in the absence of ribosomal
proteins. The structural features of individual domains of rRNAs in Figs. 6(a) and 6(b) are
quantified in terms of Rg, Δ, and S, with corresponding regions differently colored in the
secondary structure map. Comparison of Δ and S values of rRNA domains (Table 1) with P(Δ)
and P(S) in Fig. 3 shows that, except for the 3'm domain of 16S rRNA, the overall shapes of
rRNA domains are nearly spherical and slightly prolate (0 < S < 0.25). Thus, no significant
difference in the overall shape is found between rRNA domains and typical RNA molecules.
However, the deviations of Rg from the scaling law (Eq. 7), especially for domains II, IV,
V, and VI of 23S rRNA, show that they are more extended in size than normal RNA
(Fig. 7(a)). We find that the sizes of the domains in the 16S rRNA (5', C, 3'M) obey the
scaling law (Eq. 7).

Because the shape of the fold of each domain is identical to the one assembled in the intact
ribosome, the assembly from extended domains must occur by a jigsaw puzzle type matching.
The head part of the 16S rRNA, which is crucial for the A, P, and E tRNA binding sites, is
entirely composed of the 3'M domain. The 5' and C domains comprise the body and the
platform part, respectively (see the cited reference for terminology). The 3'm domain lies
at the interface and interacts with domain IV of the 23S rRNA when the two subunits dock.
After the rRNA domains and r-proteins are assembled to form a functional subunit, the 50S
subunit is highly spherical (Δ = 0.05, S = −0.01). In contrast, the 30S subunit is
aspherical and prolate (Δ = 0.21, S = 0.14). The acquisition of the spherical shape of the
entire ribosome (Δ = 0.03, S = 0.01) must occur after the folding of the two subunits.
Comparison of the shapes of the 30S, 50S, and 70S particles suggests that there is very
little alteration in their respective Δ and S values upon complexation. This observation
suggests that these domains probably fold prior to assembly.

Despite their large sizes, the 50S and the 70S particles are considerably more spherical than 
the majority of RNA molecules. The globular nature of the 50S particle and the 70S complex is 
surprising given that the typical RNA complexes are aspherical. This asphericity, especially for 
medium-sized RNA, is the result of coaxial stacking of helices found in the secondary structures. 
The stacking leads to formation of long helices which are expected to be rigid with large values 
of Ip. The 30S subunit, which is highly aspherical and prolate, fits well with this expectation. 
Noller has pointed out that the ribosome is made up of mostly small helices linked by
flexible bulges and loops. This observation applies to the 50S subunit (Fig. 6(b)). However,
large-sized coaxial stackings are dominant in the 16S rRNA, but not in the 23S rRNA. As a
result, the 30S subunit is highly aspherical. The 70S complex is highly spherical. The
globularity of the 70S arises because the 30S subunit fits precisely (despite its high Δ
and S values (see Table 1)) at the interface with the 50S to create a nearly perfect sphere.

r-proteins: Similar quantitative analysis can be performed on the ribosomal proteins. The
values of Rg in some r-proteins deviate from the scaling law and the shape is generally more
biased towards prolate than in the non-ribosomal proteins (Fig. 7). Ribosomal proteins
are mostly distributed on the back of the interface and the periphery of rRNAs, with some of
the proteins being anchored deep in the crevices of rRNA. The anchoring is accomplished
using the long tail of the peptide chain composed of positively charged amino acids (ARG,
LYS, HIS). The unusual topology of r-proteins prompted us to investigate whether or not the
r-proteins maintain their shape in isolation. We compared the structures of 16 r-proteins
complexed in the ribosome with the isolated r-protein structures independently determined
by X-ray or NMR available in the PDB. The structural deviation between the isolated and
ribosome-complexed r-proteins is quantified using the root mean square deviation (RMSD).
The structured domains, like α-helices and β-sheets, are well matched in the isolated
protein and in the complex, but the structural deviation is large in the loop and the tail
regions of the structure. The structure comparison suggests that at least the ordered part
of the r-protein is well conserved in both situations. The disordered tail part is
stabilized upon complex formation inside the crevices of rRNA.
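
The RMSD comparison can be illustrated with a short sketch. The text does not state how the isolated and complexed structures were superimposed, so the example below assumes an optimal rigid-body superposition by the Kabsch algorithm over matched atoms (for example Cα atoms); the function name `kabsch_rmsd` and the test coordinates are hypothetical.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between matched point sets p and q after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p0 = p - p.mean(axis=0)                      # remove translation
    q0 = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p0.T @ q0)          # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # rotation that maps p0 onto q0
    p_rot = p0 @ rot.T
    return float(np.sqrt(((p_rot - q0) ** 2).sum() / len(p)))

# A rotated and translated copy of a structure has zero RMSD after superposition.
ref = np.array([[0.0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]])
rot_z = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])   # 90 degrees about z
moved = ref @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(ref, moved))
```

Restricting the matched atoms to the structured core, or to the loop and tail regions, would separate the well-conserved part of an r-protein from the parts that deviate upon complex formation.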

CONCLUSIONS 

In this paper we have shown, by analyzing the available RNA structures, that Rg can be 
accurately computed using the celebrated Flory law. In contrast to proteins, RNA molecules 
are considerably more aspherical with the overall shape being prolate. The prolate nature of 
RNA shapes suggests that their diffusion is intrinsically anisotropic. For a given value of N (the 
number of nucleotides or amino acids) the persistence length of RNA is considerably larger than 
proteins. These findings suggest that typically RNA is not nearly as densely packed as proteins 
even though both are compact in the folded states. 

The structural basis for the success of the WLC model in quantitatively fitting the distance
distribution curves for proteins and RNA is not clear. It has been appreciated for a long
time that elasticity-based models are appropriate for ds-DNA in monovalent counterions. The
present finding that P(r) (for r/Rg > 1) for compact RNA and proteins can be described using
polymer models that account only for bending energies is surprising. Our work shows that lp,
which is needed to describe interactions between biomolecules, can be accurately obtained
using the experimentally measurable P(r). The fit of P(r) to the WLC also shows that lp
increases with N. Such an unusual behavior is, perhaps, related to the need to minimize
entropic fluctuations in the native state. Suppression of conformational fluctuations in
long RNA can be achieved by having a small number of long rigid helices that are stabilized
by weak tertiary interactions. Despite the success of the polymer-based analysis of RNA
structures of varying complexity, the microscopic basis for characterizing folded
biomolecules using the WLC model remains to be established.

APPENDIX 

The observation that the persistence length of RNA in the compact folded states increases as
l_p ~ a_1 N^{1/3} with a_1 ~ 1.5 Å was rationalized in terms of the restricted conformational fluctuations
in the native state. A corollary of this interpretation is that l_p should become independent of N
(or the sequence) if RNA is in the unfolded state. In this appendix, we adopt an oversimplified
model for the unfolded state of RNA to show explicitly that at large N (N > 40) l_p indeed does
not depend on N.

The absence of persistent tertiary structure allows us to describe the polynucleotide chain using a
worm-like chain (WLC) model. Such a coarse-grained description may be an approximate representation
of a single-stranded chain made up of one type of nucleotide (for example, polyA). To verify how l_p
changes as N increases, we performed simulations using a WLC model that includes, beyond the bond and bending terms,
only the excluded volume interactions between the beads representing the nucleotides. The
energy function is

\[
H = \sum_{i=1}^{N-1} \frac{k_b}{2}\,(r_{i,i+1} - a)^2 + \sum_{i=1}^{N-2} k_a\,(1 - \hat{r}_{i,i+1} \cdot \hat{r}_{i+1,i+2}) + \sum_{i=1}^{N-2} \sum_{j=i+2}^{N} \frac{k_e}{2}\,(r_{ij} - a)^2\,\Theta(a - r_{ij}) \tag{8}
\]

where r_{ij} and \hat{r}_{ij} are the distance and the unit vector between beads i and j, respectively. The first term
restricts the extension (or reduction) of the bond length around a, with k_b = 2000ε/a^2 where ε is the
unit of energy. The second term is the bond angle potential that prohibits significant deviation
from the equilibrium value. We assign k_a = 10ε. The last term, with k_e = 2000ε/a^2, takes into
account the volume exclusion interaction and acts only on bead pairs closer than a. By construction, the homopolymer WLC cannot form
any preferred low energy compact structures.
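
The three terms of the energy function can be evaluated directly from bead coordinates. The following is a minimal Python sketch, not the authors' code. The function name `wlc_energy`, the default units a = 1 and ε = 1, and the factors of 1/2 in the bond and excluded-volume terms are assumptions made for illustration.

```python
import numpy as np


def wlc_energy(pos, a=1.0, eps=1.0):
    """Energy of a bead chain with bond, bending, and excluded-volume terms.

    pos is an (N, 3) array of bead positions. Force constants follow the
    values quoted in the text: k_b = k_e = 2000*eps/a^2 and k_a = 10*eps.
    """
    pos = np.asarray(pos, dtype=float)
    k_b = 2000.0 * eps / a ** 2
    k_a = 10.0 * eps
    k_e = 2000.0 * eps / a ** 2

    # Bond term: harmonic restraint of each bond length around a.
    bonds = pos[1:] - pos[:-1]
    lengths = np.linalg.norm(bonds, axis=1)
    e_bond = 0.5 * k_b * np.sum((lengths - a) ** 2)

    # Bending term: penalizes misalignment of consecutive bond unit vectors.
    unit = bonds / lengths[:, None]
    cosines = np.sum(unit[:-1] * unit[1:], axis=1)
    e_bend = k_a * np.sum(1.0 - cosines)

    # Excluded volume: harmonic repulsion for pairs (i, j >= i + 2) closer than a.
    i, j = np.triu_indices(len(pos), k=2)
    dist = np.linalg.norm(pos[i] - pos[j], axis=1)
    overlap = np.clip(a - dist, 0.0, None)
    e_excl = 0.5 * k_e * np.sum(overlap ** 2)

    return e_bond + e_bend + e_excl
```

A straight chain with unit spacing has zero energy in this sketch, and a single right-angle bend costs k_a = 10ε.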
For this model, we obtained the end-to-end distance (R_e) distribution function using Monte Carlo
simulations. Using the energy function in Eq. (8), we
generated a large number of equilibrium conformations of the WLC model by employing the pivot
algorithm. Unlike standard Monte Carlo methods, which generate polymer conformations by
moving individual monomers, the pivot algorithm produces a global change in the configuration by
pivoting the chain around a randomly selected monomer position at each iteration. The
algorithm enhances the sampling rate of the available conformational space. Acceptance is
decided by the Metropolis criterion.
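
A pivot move with Metropolis acceptance can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the paper: the function names are hypothetical, the rotation is drawn uniformly at random, the temperature is set so that k_B T = ε (`beta = 1`), and the factors of 1/2 in the energy are assumed.

```python
import numpy as np

A = 1.0    # equilibrium bond length a (assumed unit)
EPS = 1.0  # energy unit epsilon (assumed unit)


def energy(pos):
    """Bond, bending, and excluded-volume energy (k_b = k_e = 2000, k_a = 10)."""
    bonds = pos[1:] - pos[:-1]
    lengths = np.linalg.norm(bonds, axis=1)
    unit = bonds / lengths[:, None]
    # 1000 = k/2 with k = 2000 * EPS / A^2.
    e_bond = 1000.0 * EPS / A ** 2 * np.sum((lengths - A) ** 2)
    e_bend = 10.0 * EPS * np.sum(1.0 - np.sum(unit[:-1] * unit[1:], axis=1))
    i, j = np.triu_indices(len(pos), k=2)
    overlap = np.clip(A - np.linalg.norm(pos[i] - pos[j], axis=1), 0.0, None)
    e_excl = 1000.0 * EPS / A ** 2 * np.sum(overlap ** 2)
    return e_bond + e_bend + e_excl


def random_rotation(rng):
    """Uniformly distributed rotation matrix built from a random unit quaternion."""
    q = rng.normal(size=4)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def pivot_step(pos, rng, beta=1.0):
    """Rotate the chain segment after a random interior bead; accept by Metropolis."""
    k = rng.integers(1, len(pos) - 1)  # interior pivot bead
    trial = pos.copy()
    trial[k + 1:] = pos[k] + (pos[k + 1:] - pos[k]) @ random_rotation(rng).T
    delta = energy(trial) - energy(pos)
    if delta <= 0.0 or rng.random() < np.exp(-beta * delta):
        return trial, True
    return pos, False


def sample_end_to_end(n_beads, n_steps, seed=0):
    """Run pivot moves from a straight chain and record the end-to-end distance."""
    rng = np.random.default_rng(seed)
    pos = np.zeros((n_beads, 3))
    pos[:, 0] = A * np.arange(n_beads)
    distances = []
    for _ in range(n_steps):
        pos, accepted = pivot_step(pos, rng, beta)  if False else pivot_step(pos, rng)
        distances.append(np.linalg.norm(pos[-1] - pos[0]))
    return np.array(distances)
```

Because each move is a rigid rotation of one part of the chain, bond lengths are preserved in this sketch, so only the bending and excluded-volume terms change between trial conformations.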

From the ensemble of conformations generated using the pivot algorithm we obtained the
end-to-end distribution function P(R_e). The simulated distribution function P(R_e) can be fit
using EqISl from which we obtain Ip. The dependence of Ip on N for the WLC, without the 
possibility of forming ordered structures, shows (Fig. 8) that l_p becomes independent of N when
N > 40. The rise in l_p for N < 40 is due to the domination of the bending energy (second term
in Eq. (8)). For larger values of N, the entropic contributions can compensate for the bending
energy and l_p saturates to its intrinsic value. Thus, for a WLC with excluded volume interactions,
the bending penalty dominates at small N and the chain is intrinsically flexible when N
is very large. This situation is in stark contrast with folded RNA (or proteins), where l_p grows
with N. The increase of l_p with N, which is due to interactions that stabilize RNA,
is required to suppress conformational fluctuations when biomolecules reach the functionally
competent state. Similar findings are well known for polypeptides such as polyPro, polyGly,
etc.
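
The distribution P(R_e) can be estimated from sampled end-to-end distances by histogramming them against the reduced variable R_e/L. The sketch below is a minimal illustration with a hypothetical function name, and it assumes that the contour length L is supplied by the caller (for example, (N − 1)a for a bead chain). It covers only the histogram step, not the fit that yields l_p.

```python
import numpy as np


def end_to_end_distribution(r_e, contour_length, bins=20):
    """Normalized histogram of R_e / L on the interval [0, 1].

    Returns the bin centers and the probability density, which integrates
    to 1 when all samples satisfy 0 <= R_e <= L.
    """
    x = np.asarray(r_e, dtype=float) / contour_length
    density, edges = np.histogram(x, bins=bins, range=(0.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density
```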
TABLE I: Structural features of the ribosome.
^a N is the number of nucleotides or amino acids.
^The radius of gyration Rq is calculated using Eq^] 
'^The shape parameters A and S are computed using Eq|2| El 
^d l_p is the persistence length.

^e The root mean square deviation measures the structural deviation between the ribosomal proteins in the complex
and in isolation.

^f Persistence length is not reported if the correlation coefficient of the nonlinear fit is less than 0.85.
^g Unusually large values of the parameters (Δ, S > 0.6, and l_p > 7.0 Å) are given in bold.
FIGURE CAPTIONS

Figure 1: Distribution of RNA structures in the Protein Data Bank (PDB) as a function of
chain length, N. The arrows show the N values for 16S and 23S ribosomal RNAs, respectively.
The inset shows the same plot for protein structures.

Figure 2: (a) Radius of gyration as a function of N. The straight line is a fit to the data that
shows the scaling law R_G = 5.5N^{0.33} Å. The correlation coefficient is 0.94. If data for N > 300 are
neglected we found Rq = 5.6N^'^^ with a correlation coefficient of 0.92 (fit in green). Data points 
inside the circle, which deviate significantly from the scaling law, correspond to structures
that are similar to ds-DNA (PDB code: 1H1K). We excluded these structures from the fitting
procedure. For comparison, the plot of R_G as a function of N for 13704 monomeric proteins is
shown in the inset. The linear line corresponds to -Rg = 3.1A°'^^A with a correlation coefficient 
of 0.89. (b) Distance distribution of neighboring phosphorus atoms along the RNA backbone. The
distance R_{P-P} is the separation between the backbone P atoms of the i-th and (i+1)-th
nucleotides, where i = 1, 2, ..., (N − 1).

Figure 3: (a) Distribution of the asphericity parameter Δ for RNA. The top panel corresponds
to a single chain, the middle panel to a single chain in a complex, and the bottom panel
to the complex. Large deviations from sphericity are found in RNA. (b) Distribution of the shape
parameter S for RNA. The legend for the three panels is the same as in (a). RNA molecules in
general are aspherical and prolate, like an American football.

Figure 4: (a) The distance distribution P(r) as a function of r for selected proteins and RNA.
We calculated P(r) using the coordinates of the folded structures. The legend at the bottom
gives the PDB codes for which P(r) is shown. (b) Dependence of P(r) on the dimensionless
variable x = r l_p/R_G^2. If RNA and proteins can be modeled as WLC, then it follows that, for
X > 1, P{x) should fall on a single line (See EqlHI) independent of the fold. The tails of P{x) 
for P(r) in (a) practically collapse onto a single curve. The log P(r)//5 distributions between 
dash lines are plotted as a function of —j^, which show a nice overlap with the condition, 
~ 1, being satisfied. 

Figure 5: Dependence of l_p on the chain length for RNA and proteins. The persistence
length Ip was computed by fitting P(r) to EqlHl The lines correspond to Ip = 1.47A'^'^^A (RNA) 
and Ip = 1.00A*^'^^A (proteins). There is greater dispersion in the data for proteins than for 
RNA. Indeed, the correlation coefficient in the fit for RNA is 0.98, whereas for proteins it is only
0.79. Nevertheless, the l_p values for proteins are in the range inferred from experiments for both
peptides and proteins.

Figure 6: (a) Structural domains of the 16S rRNA. The corresponding secondary structure at
the center is in the same color. Views from the interface (left) and the back (right) of the 16S rRNA assembled
from these structural domains are shown. (b) Structural domains of the 23S rRNA. The organization of the
figure is identical to that of the 16S rRNA in (a). The coaxial stacks are specified as dark lines
on the secondary structures. Molecular graphics images were produced using XRNA and the UCSF
Chimera package.

Figure 7: (a) Radii of gyration (R_G) of the structural domains in 16S (filled circles) and 23S
(empty diamonds) rRNAs are plotted as a function of N. The red line, representing R_G = 5.5N^{0.33},
is drawn to show the deviation of rRNA domains from the statistics found in typical RNAs. (b)
Plot of Rg against A^ for ribosomal proteins. Red line represents Rg = S.IA^^'''^ scaling law 
found in "normal" globular proteins. Ribosomal proteins (L3, L4, L9, S10, S12, S13, S20) that
show a large deviation from the scaling law are explicitly indicated. When the tails of these
proteins are removed, R_G for the r-proteins obeys the Flory scaling law (see open red circles).

Figure 8: Persistence length l_p as a function of N for the WLC model described in the
Appendix. This model may represent a homopolymeric nucleotide at low salt concentrations.
The value of l_p is obtained by fitting the end-to-end distribution functions P(R_e) that were
generated by Monte Carlo simulations (see Appendix). An example of P(R_e) as a function of
R_e/L for N = 30 is shown in the inset. The dependence of l_p on N shows that, for large N, l_p
is constant for a homopolymer chain at low ionic concentration.